Empirical Observations of Probabilistic Heuristics for the Clustering Problem
نویسنده
چکیده
We empirically investigate a number of strategies for solving the clustering problem under the minimum variance error criterion. First, we compare the behavior of four algorithms, 1) randomized minimum spanning tree, 2) hierarchical grouping, 3) randomized maximum cut, and 4) standard k-means. We test these algorithms with a large corpus of both contrived and real-world data sets and nd that standard k-means performs best. We found, however, that standard k-means can, with non-negligible probability, do a poor job optimizing the minimum variance criterion. We therefore investigate various randomized k-means modiications. We empirically nd that by running randomized k-means only a modest number of times, the probability of a poor solution becomes negligible. Using a large number of CPU hours to experimentally derive the apparently optimal solutions, we also nd that randomized k-means has the best rate of convergence to this apparent optimum.
منابع مشابه
زمانبندی دو معیاره در محیط جریان کاری ترکیبی با ماشینهای غیر یکسان
This study considers scheduling in Hybrid flow shop environment with unrelated parallel machines for minimizing mean of job's tardiness and mean of job's completion times. This problem does not study in the literature, so far. Flexible flow shop environment is applicable in various industries such as wire and spring manufacturing, electronic industries and production lines. After modeling the p...
متن کاملRepeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
متن کاملRanking Pharmaceutics Industry Using SD-Heuristics Approach
In recent years stock exchange has become one of the most attractive and growing businesses in respect of investment and profitability. But applying a scientific approach in this field is really troublesome because of variety and complexity of decision making factors in the field. This paper tries to deliver a new solution for portfolio selection based on multi criteria decision making literatu...
متن کاملApplication of Probabilistic Clustering Algorithms to Determine Mineralization Areas in Regional-Scale Exploration Studies
In this work, we aim to identify the mineralization areas for the next exploration phases. Thus, the probabilistic clustering algorithms due to the use of appropriate measures, the possibility of working with datasets with missing values, and the lack of trapping in local optimal are used to determine the multi-element geochemical anomalies. Four probabilistic clustering algorithms, namely PHC,...
متن کاملPersian Handwritten Digit Recognition Using Particle Swarm Probabilistic Neural Network
Handwritten digit recognition can be categorized as a classification problem. Probabilistic Neural Network (PNN) is one of the most effective and useful classifiers, which works based on Bayesian rule. In this paper, in order to recognize Persian (Farsi) handwritten digit recognition, a combination of intelligent clustering method and PNN has been utilized. Hoda database, which includes 80000 P...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 1997